Fast Development of Basic NLP Tools: Towards a Lexicon and a POS Tagger for Kurmanji Kurdish

Authors

  • Géraldine Walther
  • Benoît Sagot
  • Karën Fort
Abstract

The development of basic NLP resources for minority languages remains a challenge for both formal and computational linguists. In this paper, we show how we developed a medium-scale morphological lexicon for Kurmanji Kurdish in a few days' time using only freely accessible resources. We also developed a preliminary POS tagger, based solely on raw text and on our morphological lexicon, which is intended as a pre-annotation tool for developing a POS-annotated corpus.

Similar resources

Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy

Like most of the languages which have only recently started being investigated for the Natural Language Processing (NLP) tasks, Amazigh lacks annotated corpora and tools and still suffers from the scarcity of linguistic tools and resources. The main aim of this paper is to present a new part-of-speech (POS) tagger based on a new Amazigh tag set (AMTS) composed of 28 tags. In line with our goal ...

dTagger: A POS Tagger

The Lexical Systems Group at the National Library of Medicine (NLM) has developed a Part-of-Speech (POS) tagger to be freely distributed with the SPECIALIST NLP Tools. dTagger is specifically designed for use with the SPECIALIST lexicon but it can be used with an arbitrary tag set. It is capable of single or multi-word chunking. It is trainable with previously annotated text and in development ...

Lexicon Acquisition for Dialectal Arabic Using Transductive Learning

We investigate the problem of learning a part-of-speech (POS) lexicon for a resource-poor language, dialectal Arabic. Developing a high-quality lexicon is often the first step towards building a POS tagger, which is in turn the front-end to many NLP systems. We frame the lexicon acquisition problem as a transductive learning problem, and perform comparisons on three transductive algorithms: Tra...

A Dependency Treebank for Kurmanji Kurdish

This paper describes the development of the first syntactically annotated corpus of Kurmanji Kurdish. The corpus was used as one of the surprise languages in the 2017 CoNLL shared task on parsing Universal Dependencies. In the paper we describe how the corpus was prepared, some Kurmanji specific constructions that required special treatment, and we give results for parsing Kurdish using two pop...

Percentage of Consonants Correct for 3-5 Years Old Kurdish-Speaking Children With Middle Kurmanji-Mukryani Dialect

Objectives: The present research aims to study the normal development of Percentage of Consonant Correct (PCC) in Kurdish-speaking children, with Middle Kurmanji-Mukryani Dialect as an Articulation Competency Index (ACI). PCC was examined in terms of the manner of articulation and position of sound in the word.  Methods: In this descriptoanalytical cross-sectional study, 120 Kurdish-speak...

Publication date: 2010